A New Distribution on the Simplex with Auto-Encoding Applications

Stirn, Andrew, Jebara, Tony, Knowles, David

Neural Information Processing Systems

We construct a new distribution for the simplex using the Kumaraswamy distribution and an ordered stick-breaking process. We explore and develop the theoretical properties of this new distribution and prove that it exhibits symmetry (exchangeability) under the same conditions as the well-known Dirichlet. Like the Dirichlet, the new distribution is adept at capturing sparsity but, unlike the Dirichlet, has an exact, closed-form reparameterization, making it well suited for deep variational Bayesian modeling. We demonstrate the distribution's utility in a variety of semi-supervised auto-encoding tasks. In all cases, the resulting models achieve competitive performance commensurate with their simplicity, use of explicit probability models, and abstinence from adversarial training.
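The closed-form reparameterization rests on the Kumaraswamy inverse CDF, (1 - (1 - u)^(1/b))^(1/a), which turns uniform noise into stick-breaking fractions. Below is a minimal sketch of drawing a simplex point this way; it is not the paper's exact ordered construction, and the parameter values a, b are illustrative:

```python
import numpy as np

def kumaraswamy_sample(a, b, rng):
    # Reparameterized draw via the closed-form inverse CDF:
    # F^{-1}(u) = (1 - (1 - u)^(1/b))^(1/a), with u ~ Uniform(0, 1).
    u = rng.uniform(size=np.shape(a))
    return (1.0 - (1.0 - u) ** (1.0 / b)) ** (1.0 / a)

def stick_breaking_simplex(a, b, rng):
    # Break a unit stick with K-1 Kumaraswamy fractions; the K
    # resulting weights are nonnegative and sum to one by telescoping.
    v = kumaraswamy_sample(np.asarray(a, float), np.asarray(b, float), rng)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)))
    return remaining * np.concatenate((v, [1.0]))

rng = np.random.default_rng(0)
w = stick_breaking_simplex([2.0, 2.0, 2.0], [3.0, 3.0, 3.0], rng)
# w is a length-4 vector on the simplex
```

Because every step is a differentiable function of uniform noise, gradients can flow through the sample, which is what makes the construction suitable for variational training.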


Reviews: A New Distribution on the Simplex with Auto-Encoding Applications

Neural Information Processing Systems

Originality: Although VAEs using a stick-breaking construction with Kumaraswamy distributions have been considered before (Nalisnick & Smyth, Stick-Breaking Variational Autoencoders, 2017), the idea to use such a construction and extend it by mixing over the orderings to obtain a density more similar to a Dirichlet is new and interesting. Related work is adequately cited. Quality: The paper seems technically sound and its claims are largely supported. Although Theorem 1 is a standard result, restating it is likely useful for the subsequent exposition. Experimental results show that the method outperforms some baselines; however, I feel that some additional experiments would be useful (see details below in Section 5, Improvements).


Continual Learning with Strategic Selection and Forgetting for Network Intrusion Detection

Zhang, Xinchen, Zhao, Running, Jiang, Zhihan, Chen, Handi, Ding, Yulong, Ngai, Edith C. H., Yang, Shuang-Hua

arXiv.org Artificial Intelligence

Intrusion Detection Systems (IDS) are crucial for safeguarding digital infrastructure. In dynamic network environments, both threat landscapes and normal operational behaviors are constantly changing, resulting in concept drift. While continual learning mitigates the adverse effects of concept drift, insufficient attention to drift patterns and excessive preservation of outdated knowledge can still hinder the IDS's adaptability. In this paper, we propose SSF (Strategic Selection and Forgetting), a novel continual learning method for IDS that provides continuous model updates with a constantly refreshed memory buffer. Our approach features a strategic sample selection algorithm to select representative new samples and a strategic forgetting mechanism to drop outdated ones. The proposed sample selection algorithm prioritizes new samples that cause the 'drifted' pattern, enabling the model to better understand the evolving landscape. Additionally, we introduce strategic forgetting upon detecting significant drift: outdated samples are discarded to free up memory, allowing the incorporation of more recent data. SSF captures evolving patterns effectively and keeps the model aligned with changing data patterns, significantly enhancing the IDS's adaptability to concept drift. The state-of-the-art performance of SSF on the NSL-KDD and UNSW-NB15 datasets demonstrates its superior adaptability to concept drift for network intrusion detection.
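The buffer mechanics described above can be sketched generically. This is not SSF's actual algorithm; the drift score, threshold, and capacity below are illustrative assumptions:

```python
from collections import deque

class DriftAwareBuffer:
    # Fixed-capacity memory: admit samples that score high under some
    # drift measure; on detected drift, drop the oldest half to make
    # room for data from the new distribution.
    def __init__(self, capacity=8):
        self.buf = deque(maxlen=capacity)

    def add(self, sample, drift_score, threshold=0.5):
        # Prioritize samples that reflect the drifted pattern.
        if drift_score >= threshold:
            self.buf.append(sample)

    def forget_on_drift(self):
        # Strategic forgetting: discard the oldest half of the buffer.
        for _ in range(len(self.buf) // 2):
            self.buf.popleft()

buf = DriftAwareBuffer(capacity=8)
for i in range(4):
    buf.add(i, drift_score=0.9)   # representative samples are kept
buf.add(99, drift_score=0.1)      # low-drift sample is skipped
buf.forget_on_drift()             # oldest half evicted
```

The point of the sketch is the two levers: admission is score-based while eviction is age-based, so the buffer tracks the current distribution without unbounded growth.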


MetaRM: Shifted Distributions Alignment via Meta-Learning

Dou, Shihan, Liu, Yan, Zhou, Enyu, Li, Tianlong, Jia, Haoxiang, Xiong, Limao, Zhao, Xin, Ye, Junjie, Zheng, Rui, Gui, Tao, Zhang, Qi, Huang, Xuanjing

arXiv.org Artificial Intelligence

The success of Reinforcement Learning from Human Feedback (RLHF) in language model alignment is critically dependent on the capability of the reward model (RM). However, as training progresses, the output distribution of the policy model shifts, reducing the RM's ability to distinguish between responses. This issue is further compounded when the RM, trained on a specific data distribution, struggles to generalize to examples outside that distribution. These two issues can be unified as a single challenge: the shifted distribution of the environment. To surmount this challenge, we introduce MetaRM, a method leveraging meta-learning to align the RM with the shifted environment distribution. MetaRM trains the RM by minimizing the loss, particularly on data that improves its ability to differentiate examples from the shifted target distribution. Extensive experiments demonstrate that MetaRM significantly improves the RM's distinguishing ability in iterative RLHF optimization and also provides the capacity to identify subtle differences in out-of-distribution samples.


LARA: A Light and Anti-overfitting Retraining Approach for Unsupervised Anomaly Detection

Chen, Feiyi, Qin, Zhen, Zhang, Yingying, Deng, Shuiguang, Xiao, Yi, Pang, Guansong, Wen, Qingsong

arXiv.org Artificial Intelligence

Most current anomaly detection models assume that the normal pattern stays the same over time. However, the normal patterns of Web services change dramatically and frequently, and a model trained on old-distribution data becomes outdated after such changes. Retraining the whole model every time is expensive. Moreover, at the beginning of a normal-pattern change, there is not enough observation data from the new distribution, and retraining a large neural network model with limited data is vulnerable to overfitting. Thus, we propose a Light and Anti-overfitting Retraining Approach (LARA) for deep variational auto-encoder based time series anomaly detection methods (VAEs). This work makes three novel contributions: 1) formulating the retraining process as a convex problem, which converges at a fast rate and prevents overfitting; 2) designing a ruminate block, which leverages historical data without the need to store it; and 3) proving mathematically that, when fine-tuning the latent vector and reconstructed data, linear formations achieve the least adjusting error between the ground truths and the fine-tuned values. Moreover, we have performed many experiments verifying that retraining LARA with as few as 43 time slots of data from the new distribution yields an F1 score competitive with state-of-the-art anomaly detection models trained on sufficient data. We also verify its light overhead.


Stat Stories: Normalizing Flows as an Application of Variable Transformation

#artificialintelligence

While other statistical methods such as Generative Adversarial Networks (GANs) and Variational AutoEncoders (VAEs) have achieved dramatic results on difficult tasks such as learning distributions of images and other complicated datasets, they do not allow density estimation or calculation of the probability density of new data points. In this sense, normalizing flows prove valuable: the method can perform density estimation and sampling as well as variational inference. Consider a transformation u = g(x; θ), i.e., g is parametrized by a parameter vector θ.
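The change-of-variables rule behind this: if u = g(x; θ) is invertible and differentiable, then p_x(x) = p_u(g(x)) |det ∂g/∂x|. A minimal sketch with a one-dimensional affine flow and a standard-normal base density (the scale and shift values are illustrative):

```python
import math

def affine_flow_logpdf(x, scale, shift):
    # Affine transformation u = g(x) = scale * x + shift.
    # Change of variables: p_x(x) = p_u(g(x)) * |du/dx|, so
    # log p_x(x) = log p_u(u) + log|scale|.
    u = scale * x + shift
    base_logpdf = -0.5 * (u * u + math.log(2 * math.pi))  # standard normal
    return base_logpdf + math.log(abs(scale))

lp = affine_flow_logpdf(0.5, scale=2.0, shift=-1.0)
```

Here u = 2·0.5 - 1 = 0, so the result equals the standard-normal log-density at zero plus log 2; equivalently, x is distributed N(0.5, 0.25), and the flow recovers exactly that density. Stacking several such invertible maps, each with a tractable Jacobian, gives a full normalizing flow.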


The $r$-value: evaluating stability with respect to distributional shifts

Gupta, Suyash, Rothenhäusler, Dominik

arXiv.org Machine Learning

Common statistical measures of uncertainty like $p$-values and confidence intervals quantify the uncertainty due to sampling, that is, the uncertainty due to not observing the full population. In practice, populations change between locations and across time. This makes it difficult to gather knowledge that transfers across data sets. We propose a measure of uncertainty that quantifies the distributional uncertainty of a statistical estimand with respect to Kullback-Leibler divergence, that is, the sensitivity of the parameter under general distributional perturbations within a Kullback-Leibler divergence ball. If the signal-to-noise ratio is small, distributional uncertainty is a monotone transformation of the signal-to-noise ratio. In general, however, it is a different concept and corresponds to a different research question. Further, we propose measures to estimate the stability of parameters with respect to directional or variable-specific shifts. We also demonstrate how the measure of distributional uncertainty can be used to prioritize data collection for better estimation of statistical parameters under shifted distributions. We evaluate the performance of the proposed measure in simulations and on real data and show that it can elucidate the distributional (in-)stability of an estimator with respect to certain shifts and give more accurate estimates of parameters under a shifted distribution while requiring only limited information to be collected from the shifted distribution.
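The flavor of sensitivity within a KL ball can be illustrated with the standard first-order expansion from distributionally robust statistics: for a mean estimand, the worst case within a KL ball of radius δ moves the mean by roughly sqrt(2δ · Var). This is a generic illustration, not the paper's r-value itself, and the sample data below are made up:

```python
import math
import statistics

def worst_case_mean_shift(samples, delta):
    # First-order bound for perturbations Q with KL(Q || P) <= delta:
    # sup_Q E_Q[X] ~= E_P[X] + sqrt(2 * delta * Var_P(X)).
    mean = statistics.mean(samples)
    var = statistics.pvariance(samples)
    return mean + math.sqrt(2 * delta * var)

shifted = worst_case_mean_shift([0.0, 1.0, 0.0, 1.0], delta=0.02)
```

High-variance estimands are thus more fragile under the same divergence budget, which is the intuition behind reporting a stability measure alongside the point estimate.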


Weights & Biases - ML Best Practices: Test Driven Development at Latent Space

#artificialintelligence

I sat down with the Latent Space team to talk about best practices around collaboration and managing model iteration. In machine learning, bugs may affect the distribution of possible models more than any particular instance, making traditional deterministic tests misleading. Because of this, a test-driven development framework for large ML models must account for the statistical nature of training. This is especially crucial when multiple researchers and engineers are contributing to the same model, as it's easy to silently introduce regressions into a codebase. Here, the team shares some insights about how this new form of test-driven development has been the key to moving quickly on a large-scale collaborative project.
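One way to make the "statistical nature of training" concrete: run the training under several random seeds and assert on the distribution of the metric rather than a single run. This is a generic sketch, not Latent Space's actual harness; the seed count, metrics, and tolerance are illustrative:

```python
import statistics

def regression_check(metric_by_seed, baseline_mean, max_drop=0.02):
    # A deterministic assert on one run is noisy; instead compare the
    # mean metric across seeds against the committed baseline, with a
    # tolerance sized to normal run-to-run variation.
    mean = statistics.mean(metric_by_seed)
    stdev = statistics.stdev(metric_by_seed)
    return mean >= baseline_mean - max_drop, mean, stdev

# e.g. validation accuracy from four seeded training runs
ok, mean, stdev = regression_check([0.913, 0.908, 0.917, 0.911],
                                   baseline_mean=0.91)
```

A change that silently degrades the model shifts the whole distribution of outcomes, which this kind of check catches even when any single run could plausibly pass.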